Speaker and Noise Factorisation for Robust Speech Recognition
Author
Abstract
Speech recognition systems need to operate in a wide range of conditions, so they should be robust to extrinsic variability caused by acoustic factors such as speaker differences, the transmission channel and background noise. In many scenarios, several factors simultaneously affect the underlying "clean" speech signal. This paper examines techniques for handling both speaker and background noise differences. An acoustic factorisation approach is adopted, in which separate transforms represent the speaker factor (maximum likelihood linear regression, MLLR) and the noise and channel factors (model-based vector Taylor series, VTS). This framework is considerably more flexible than the standard approach of modelling the combined impact of the speaker and noise factors. For example, factorisation allows the speaker characteristics obtained in one noise condition to be applied in a different environment. To obtain this factorisation, modified versions of MLLR and VTS training and application are derived. The proposed scheme is evaluated for both adaptation and factorisation on the AURORA4 data.
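The factorisation idea in the abstract can be illustrated with a minimal sketch. It is not the paper's modified MLLR/VTS scheme: it assumes the textbook MLLR mean transform (mu_spk = A mu + b) and the standard VTS mismatch function for static cepstral means, y = x + h + C log(1 + exp(C^-1 (n - x - h))), with a square orthonormal DCT so that C is invertible. All function and variable names are hypothetical.

```python
import numpy as np

def dct_matrix(dim):
    """Orthonormal DCT-II matrix (square, so its inverse is its transpose)."""
    k = np.arange(dim)[:, None]
    n = np.arange(dim)[None, :]
    C = np.sqrt(2.0 / dim) * np.cos(np.pi * (n + 0.5) * k / dim)
    C[0, :] /= np.sqrt(2.0)
    return C

def mllr_mean(mu_clean, A, b):
    """Speaker factor: affine MLLR transform of a clean Gaussian mean."""
    return A @ mu_clean + b

def vts_mean(mu_x, mu_n, mu_h, C):
    """Noise/channel factor: zeroth-order VTS compensation of a static mean.

    mu_n is the additive noise mean and mu_h the channel mean (cepstral domain).
    """
    log_ratio = C.T @ (mu_n - mu_x - mu_h)  # back to the log-spectral domain
    return mu_x + mu_h + C @ np.log1p(np.exp(log_ratio))

def factorised_mean(mu_clean, speaker, noise, C):
    """Apply the speaker transform first, then the noise transform."""
    A, b = speaker
    mu_n, mu_h = noise
    return vts_mean(mllr_mean(mu_clean, A, b), mu_n, mu_h, C)

if __name__ == "__main__":
    dim = 4
    C = dct_matrix(dim)
    mu_clean = np.array([1.0, -0.5, 0.3, 0.2])
    # Hypothetical speaker transform, imagined as estimated in one noise condition.
    speaker = (0.9 * np.eye(dim), np.full(dim, 0.1))
    # Two hypothetical environments, each a (noise mean, channel mean) pair.
    env_a = (C @ np.full(dim, -2.0), np.zeros(dim))
    env_b = (C @ np.full(dim, 1.0), np.full(dim, 0.05))
    # The same speaker transform is reused unchanged in both environments.
    print(factorised_mean(mu_clean, speaker, env_a, C))
    print(factorised_mean(mu_clean, speaker, env_b, C))
```

Because the speaker and noise transforms are kept separate, changing the environment only swaps the `noise` pair; the `speaker` pair is reused as-is. A full system would also compensate variances and dynamic parameters, which this sketch omits.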
Similar resources
Noise robust speaker recognition with convolutive sparse coding
Recognition and classification of speech content in everyday environments is challenging due to the large diversity of real-world noise sources, which may also include competing speech. At signal-to-noise ratios below 0 dB, a majority of features may become corrupted, severely degrading the performance of classifiers built upon clean observations of a target class. As the energy and complexity o...
Improving the performance of MFCC for Persian robust speech recognition
The Mel frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors, estimated through the following steps. First, spectral mean normalization is a pre-processing step that is applied to t...
Group Sparsity for Speaker Identity Discrimination in Factorisation-based Speech Recognition
Spectrogram factorisation using a dictionary of spectrotemporal atoms has been successfully employed to separate a mixed audio signal into its source components. When atoms from multiple sources are included in a combined dictionary, the relative weights of activated atoms reveal likely sources as well as the content of each source. Enforcing sparsity on the activation weights produces solution...
An explicit independence constraint for factorised adaptation in speech recognition
Speech signals are usually affected by multiple acoustic factors, such as speaker characteristics and environment differences. Usually, the combined effect of these factors is modelled by a single transform. Acoustic factorisation splits the transform into several factor transforms, each modelling only one factor. This allows, for example, estimating a speaker transform in a noise condition and...
Convolutional neural network with adaptive windows for speech recognition
Although speech recognition systems are widely used and their accuracy continues to improve, there is a considerable gap between their accuracy and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...